# Pivot to wider format for principal component analysis
df_feature_wide <- df_sample_wise |>
  mutate(intensity = unname(intensity)) |>
  pivot_wider(id_cols = c(cell_type, replicate_n),
              names_from = 'protein_groups',
              values_from = 'intensity') |>
  select(where(~ !any(is.na(.))))  # Selects only columns without NA

# Get numerical inputs
df_input <- df_feature_wide |>
  select(-c('replicate_n', 'cell_type'))

# Group datapoints by cell types
colourby <- pull(df_feature_wide, 'cell_type')

# Projecting data onto principal components (pc)
pc <- prcomp(df_input, center = FALSE, scale = FALSE)
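As a minimal sketch of what can be read off a `prcomp` result, the example below runs the same call on a small toy matrix and computes the share of variation per component and the sample scores. The matrix `toy_input`, its values, and the cell type labels are hypothetical stand-ins for `df_input` and `colourby`. Because `center = FALSE` is used here, as in the code above, the shares describe variation about the origin rather than about the column means.

```r
set.seed(1)

# Hypothetical stand-in for df_input: 6 samples x 4 protein groups
toy_input <- matrix(rnorm(24), nrow = 6,
                    dimnames = list(NULL, paste0("protein_", 1:4)))

# Same call as in the pipeline, on the toy data
pc_toy <- prcomp(toy_input, center = FALSE, scale. = FALSE)

# Share of total variation captured by each principal component
var_explained <- pc_toy$sdev^2 / sum(pc_toy$sdev^2)

# Scores: each sample projected onto the components, ready for plotting
scores <- as.data.frame(pc_toy$x)
scores$cell_type <- rep(c("A", "B"), each = 3)  # hypothetical labels
```

Plotting `PC1` against `PC2` from `scores`, coloured by `cell_type`, gives the usual PCA scatter plot.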
PCA Results
Volcano - augmentation
Calculation of mean value and standard deviation
Switching to a wider data frame (better for computing the fold change and the t-test)
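The two steps above can be sketched on toy data as follows. This is an illustration under stated assumptions, not the project's own code: the data frame `df_long`, its values, the cell type names `A` and `B`, and the output column names are hypothetical, and the intensities are assumed to be log2-transformed already, so a difference of means is a log2 fold change.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical long-format data: 2 protein groups, 2 cell types, 3 replicates
df_long <- tibble(
  protein_groups = rep(c("P1", "P2"), each = 6),
  cell_type      = rep(rep(c("A", "B"), each = 3), times = 2),
  replicate_n    = rep(1:3, times = 4),
  intensity      = c(10.1, 10.3, 9.9, 12.0, 12.2, 11.8,
                     8.0, 8.2, 7.9, 8.1, 8.0, 8.3)
)

# Mean and standard deviation per protein group and cell type
df_summary <- df_long |>
  group_by(protein_groups, cell_type) |>
  summarise(mean_intensity = mean(intensity),
            sd_intensity   = sd(intensity),
            .groups = "drop")

# Wider layout: one list-column of replicate intensities per cell type
df_wide <- df_long |>
  pivot_wider(id_cols = protein_groups,
              names_from = cell_type,
              values_from = intensity,
              values_fn = list)

# Fold change and Welch t-test p-value, computed row by row with purrr
df_volcano <- df_wide |>
  mutate(
    log2_fold_change = map2_dbl(A, B, ~ mean(.y) - mean(.x)),
    p_value          = map2_dbl(A, B, ~ t.test(.x, .y)$p.value)
  )
```

Keeping the replicates in list-columns is one way to call the base R `t.test` from inside a `mutate`, since each row then carries both samples it needs.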
uniprot_lookup <- function(gene_id, dataframe, id_column, keyword_column) {
  # does lookup in uniprot df and returns Keywords
  return(Keywords)
}

df_type <- df_norm_intensities |>
  filter(!is.na(genes) & genes != "") |>
  mutate(description_column = map_chr(
    .x = genes,
    .f = ~ uniprot_lookup(gene_id = .x,
                          dataframe = df_uniprot_mouse,
                          id_column = `Gene Names`,
                          keyword_column = Keywords)
  ))
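The body of the lookup function is only summarised on the slide. One possible way to fill it in is sketched below, purely as an assumption about how such a lookup could work: `uniprot_lookup_sketch`, the toy table `df_uniprot_toy`, and its contents are hypothetical. The sketch assumes the `Gene Names` column holds space-separated gene names and returns the keywords of the first matching row, or `NA` when no row matches.

```r
library(dplyr)
library(purrr)

# Hypothetical implementation: match a gene against space-separated gene names
uniprot_lookup_sketch <- function(gene_id, dataframe, id_column, keyword_column) {
  hits <- dataframe |>
    filter(map_lgl(strsplit({{ id_column }}, "\\s+"),
                   function(gene_names) gene_id %in% gene_names)) |>
    pull({{ keyword_column }})
  # map_chr() needs exactly one string per gene, so return NA when nothing matches
  if (length(hits) == 0) NA_character_ else hits[[1]]
}

# Toy stand-in for df_uniprot_mouse (values are made up for illustration)
df_uniprot_toy <- tibble(
  `Gene Names` = c("Actb", "Gapdh Gapd", "Tubb5"),
  Keywords     = c("Cytoskeleton", "Glycolysis", "Microtubule")
)

df_genes <- tibble(genes = c("Gapdh", "Tubb5", "Unknown1"))

df_annotated <- df_genes |>
  mutate(description_column = map_chr(
    genes,
    ~ uniprot_lookup_sketch(.x, df_uniprot_toy, `Gene Names`, Keywords)
  ))
```

Splitting on whitespace and testing for exact membership avoids the false matches that a plain substring search could produce for genes whose names are prefixes of other names.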
Discussion
Data may take many shapes and forms
The tidyverse is nice and structured, but can also be restrictive.
Some necessary functions, such as t.test, come from base R rather than the tidyverse, which makes them awkward to use in a tidy pipeline.
Principal Component Analysis showed cell differentiation
In this case, the PCA can be used to identify the cell differentiation pattern.
This is in line with what was presented in literature for this data.
Volcano plot and lookup to find overexpressed proteins
Through the volcano plots we can filter and identify the truly overexpressed proteins, which can then be looked up and studied for biological significance.
While we do not conclude any high-level biological understanding, we showcase the possibility of using the tidyverse to extract biological information.
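A minimal sketch of the filtering step is shown below. The cutoffs (an absolute log2 fold change above 1 and a p-value below 0.05), the toy table `df_volcano_toy`, and its column names are assumptions chosen for illustration, not values taken from the analysis.

```r
library(dplyr)

# Hypothetical volcano plot input: one row per protein group
df_volcano_toy <- tibble(
  protein_groups   = c("P1", "P2", "P3", "P4"),
  log2_fold_change = c(2.1, 0.3, -1.8, 1.5),
  p_value          = c(0.001, 0.2, 0.01, 0.3)
)

fold_cutoff <- 1     # assumed threshold on |log2 fold change|
p_cutoff    <- 0.05  # assumed significance threshold

# Label each protein group by direction and significance
df_flagged <- df_volcano_toy |>
  mutate(regulation = case_when(
    p_value < p_cutoff & log2_fold_change >  fold_cutoff ~ "over",
    p_value < p_cutoff & log2_fold_change < -fold_cutoff ~ "under",
    TRUE                                                 ~ "unchanged"
  ))

# Protein groups that pass both cutoffs and are candidates for lookup
overexpressed <- df_flagged |>
  filter(regulation == "over") |>
  pull(protein_groups)
```

The `regulation` column can also be mapped to colour in the volcano plot itself, so the plot and the filtered list stay consistent.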
Conclusion
Despite many internal frustrations, we have been able to…
Create code that can load, tidy, transform and visualize data containing 3700 observations across 5 cell types, and extract biological meaning from it.
Create 2 functions that build data frames for volcano plots, and 1 lookup function that annotates protein groups with keywords.
The project has been successful in its main intent, which is to showcase a pipeline for understanding cell differentiation.
For future projects or studies, a higher number of observations and cell types could be included to increase the resolution. The overall pipeline can also be used with cell types from other organisms, such as humans.